Constituent Boundary Parsing for Example-Based Machine Translation
Abstract
This paper proposes an effective parsing method for example-based machine translation. In this method, an input string is parsed by the top-down application of linguistic patterns consisting of variables and constituent boundaries. A constituent boundary is expressed by either a functional word or a part-of-speech bigram. When structural ambiguity occurs, the most plausible structure is selected using the total values of distance calculations in the example-based framework. Transfer-Driven Machine Translation (TDMT) achieves efficient and robust translation within the example-based framework by adopting this parsing method. Using bidirectional translation between Japanese and English, the effectiveness of this method in TDMT is also shown.
1 Introduction
Example-based frameworks are increasingly being applied to machine translation, since they can provide efficient and robust processing (Nagao, 1984; Sato, 1991; Sumita, 1992; Furuse, 1992; Watanabe, 1992). However, in order to make the best use of the advantages of an example-based framework, it is essential to effectively integrate an example-based method and source language analysis. Unfortunately, when an example-based method is combined with a source language analysis method having complex grammar rules, putting a heavy load on translation, the advantages of the example-based framework may be ruined. To achieve efficient and robust processing by the example-based framework, a lot of studies have been made for the purpose of combining source language analysis with an example-based method, and of efficiently covering the analyzed source language structure by means of transfer knowledge (Grishman, 1992; Jones, 1992; McLean, 1992; Maruyama, 1992, 1993; Nirenburg, 1993).
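To make the method summarized in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It inserts a boundary marker for a listed part-of-speech bigram, applies "X boundary Y" patterns top-down, and keeps the analysis with the smallest total distance. Every name and value in it is an assumption made for illustration: the pattern table, the POS tags, the head-word rule, the `DISTANCES` table standing in for example-based distance calculation, and the level check standing in for the classification of patterns by linguistic level.

```python
# Toy sketch: every pattern, tag, level, and distance value here is hypothetical.

# Boundary -> (linguistic level, side that supplies the head word).
# "to" and "at" are functional words; "<n-v>" stands for a NOUN-VERB bigram.
PATTERNS = {"<n-v>": (2, "right"), "to": (1, "left"), "at": (1, "left")}
BIGRAM_MARKERS = {("NOUN", "VERB"): "<n-v>"}

# Stand-in for example-based distances: 0.0 means close to a stored example.
DISTANCES = {
    ("<n-v>", "bus", "goes"): 0.0,
    ("to", "goes", "Kyoto"): 0.0,
    ("at", "goes", "ten"): 0.1,
    ("at", "Kyoto", "ten"): 0.8,
}
UNSEEN = 1.0  # assumed penalty when no similar example is listed


def insert_boundaries(tagged):
    """Add a marker between adjacent words whose POS bigram is listed."""
    out = []
    for i, (word, pos) in enumerate(tagged):
        out.append(word)
        if i + 1 < len(tagged):
            marker = BIGRAM_MARKERS.get((pos, tagged[i + 1][1]))
            if marker:
                out.append(marker)
    return out


def head(tree):
    """Head word of a leaf (str) or of a (boundary, left, right) node."""
    if isinstance(tree, str):
        return tree
    boundary, left, right = tree
    return head(left if PATTERNS[boundary][1] == "left" else right)


def parses(tokens, max_level=2):
    """Yield (tree, total_distance) for each top-down analysis of tokens."""
    if len(tokens) == 1:
        yield tokens[0], 0.0
        return
    for i in range(1, len(tokens) - 1):
        boundary = tokens[i]
        # A pattern may only be nested under a pattern of equal or higher level.
        if boundary not in PATTERNS or PATTERNS[boundary][0] > max_level:
            continue
        level = PATTERNS[boundary][0]
        for left, d_left in parses(tokens[:i], level):
            for right, d_right in parses(tokens[i + 1:], level):
                d = DISTANCES.get((boundary, head(left), head(right)), UNSEEN)
                yield (boundary, left, right), d_left + d_right + d


def best_parse(tagged):
    """Pick the analysis with the smallest total distance."""
    return min(parses(insert_boundaries(tagged)), key=lambda p: p[1])


if __name__ == "__main__":
    sentence = [("bus", "NOUN"), ("goes", "VERB"), ("to", "PREP"),
                ("Kyoto", "NOUN"), ("at", "PREP"), ("ten", "NUM")]
    print(best_parse(sentence))
```

In this toy input, "at ten" can attach either to "goes" or to "Kyoto"; the sketch enumerates both structures and prefers the one whose summed distances are smaller, which is the selection principle the abstract describes. A real system would compute distances from stored translation examples rather than a hand-written table.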
One way to reduce the load of source language analysis is to directly apply transfer knowledge to an input string, which simultaneously executes both structural parsing and transfer knowledge application through pattern-matching. Pattern-matching does not use grammatical symbols such as "Noun Phrase", but uses surface words and non-grammatical symbols. Therefore, in pattern-matching, rule competition is reduced, and linguistic structure is expressed in a simpler manner than in grammar-based parsing. Thus, pattern-matching achieves efficient parsing. It is also useful in treating spoken language, which sometimes deviates from conventional grammar, while grammar-based parsing has difficulty treating unrestricted spoken language.
This paper proposes a constituent boundary parsing method based on pattern-matching, and shows its effectiveness for spoken language translation within the example-based framework. In our parsing method, linguistic patterns expressing some linguistic constituents and their boundaries are applied to an input string in a top-down fashion. When structural ambiguity occurs, the most plausible structure is selected using the total values of distance calculations in the example-based framework. Since the description of a linguistic pattern is simple, it is easy to update by adding feedback.
A constituent boundary parsing method using mutual information is proposed in (Magerman, 1990). This method handles unrestricted natural language and is efficient. However, it tends to be inaccurate, and it is difficult to add feedback to, since it depends completely on statistical information without resort to a linguistic viewpoint. In contrast, in order to achieve accurate parsing and translation, our constituent boundary parsing method implicitly incorporates grammatical information into patterns, e.g. constituent boundary description by a part-of-speech bigram, and classification of patterns according to linguistic levels such as simple sentence and noun phrase.
Transfer-Driven Machine Translation (TDMT) (Furuse, 1992, 1994) uses the constituent boundary parsing method presented in this paper, as an alternative to grammar-based analysis, and makes the best use of the example-based framework. A bidirectional translation system between Japanese and English for dialogue sentences concerning international conference registrations has been implemented (Sobashima, 1994). Experiments with the system have shown our parsing method to be effective.
Section 2 defines patterns expressed by variables and constituent boundaries. Section 3 explains a method for deriving possible English structures. Section 4 explains structural disambiguation using distance calculations in the example-based framework. Section 5 explains an example of Japanese sentence analysis using our constituent boundary parsing method, and Section 6